The Gene Identification Problem from a Hypotheses Test Perspective

Authors

  • Mireia Vilardell
  • Alex Sánchez
Abstract

Many gene identification methods assign scores to gene elements as a step prior to assembling them into predicted genes. The scoring system is often based on log-likelihood ratios (LLRs), whose meaning is somewhat different from that of the usual likelihood ratio tests that appear in many statistical problems. In the first part of this work we have tried to give an interpretation of the statistical meaning of LLR-based scoring systems. We have developed several tests of significance for the scores: the "Sum-of-Scores test" (SSt), based on the straightforward score obtained by the programs; the "Intersection-Union test" (IUt), based on a multiple hypothesis testing interpretation of an exon's score; and several meta-analytical approaches, which combine the p-values corresponding to the exon's parts. We have performed simulation studies to analyze the performance of these tests. Whereas the SSt and IUt are appealing from a statistical point of view, the meta-analytic approaches have shown much better sensitivity and specificity, which suggests they may be incorporated into actual gene prediction methods as a complementary "probabilistic" score. To approximate the distribution of the tests under the null hypothesis (of non-exonicity), two approaches are possible: to resample from real "non-coding-no sites" data, or to simulate such data using the scoring model and the estimated probability matrices. Although both approaches should, in principle, be equivalent, it is interesting to compare them in order to check the validity of the scoring system used by GeneId. A comparison between the two approaches has been performed in the second part of this work using Monte Carlo simulation. The main conclusion is that using probability matrices and the scoring system is essentially equivalent to resampling only in the case of sites (start, acceptor, donor or stop). In the case of coding potential they are not always equivalent, and in general resampling is preferred to simulation.
In order to perform the analyses and tests described, a Java program has been implemented. It can be used in several ways: (1) it can take GeneId output and use it to perform some or all of the tests of exonicity; (2) it can be used to generate samples of true or false exons (partial or complete) that can serve as input for other studies, such as testing gene prediction programs.
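
The ideas in the abstract (an LLR score for a site, a null distribution approximated by simulation from the scoring model, and a meta-analytic combination of per-part p-values) can be illustrated with a minimal Python sketch. It is not the authors' Java program and does not reproduce GeneId's scoring. The function names, the toy probability matrix and the uniform background are all hypothetical, and Fisher's method is used only as one well-known way of combining p-values; the abstract does not say which combination rules were studied.

```python
import math
import random


def llr_score(seq, site_probs, background):
    """Sum of per-position log-likelihood ratios: log P(base | site) / P(base | background)."""
    return sum(math.log(site_probs[i][b] / background[b]) for i, b in enumerate(seq))


def simulate_null_scores(site_probs, background, n, rng):
    """Draw n sequences from the background model (assumed null) and score each one."""
    bases = list(background)
    weights = [background[b] for b in bases]
    length = len(site_probs)
    return [
        llr_score(rng.choices(bases, weights, k=length), site_probs, background)
        for _ in range(n)
    ]


def empirical_pvalue(observed, null_scores):
    """Right-tail Monte Carlo p-value; the +1 correction keeps it strictly positive."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k d.f. under the null.

    Assumes independent p-values, each strictly greater than 0.
    """
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    rng = random.Random(0)
    background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # hypothetical
    # Hypothetical 2-position site matrix favouring "AG".
    site = [
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    ]
    null = simulate_null_scores(site, background, 5000, rng)
    p_site = empirical_pvalue(llr_score("AG", site, background), null)
    # Combine with a second, made-up p-value standing in for another exon part.
    print(fisher_combined_pvalue([p_site, 0.04]))
```

Replacing `simulate_null_scores` with scores computed on resampled real non-coding sequences would give the resampling variant of the null distribution that the abstract compares against simulation.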





Publication date: 2004